Text-Directed Speech Enhancement Employing

Author

  • Bryan L. Pellom
Abstract

There are many situations where non-real-time speech enhancement is required. For such applications, employing any available a priori knowledge can lead to more effective enhancement solutions. In this study, a novel text-directed speech enhancement algorithm is developed for use in non-real-time applications. In our approach, the text of the intended dialogue is used to partition noisy speech into regions of broad phoneme classifications. Classes considered include stops, fricatives, affricates, nasals, vowels, semivowels, diphthongs, and silence. These partitions are then used to direct a new vector-quantizer-based enhancement scheme in which phone-class-directed constraints are applied to improve speech quality. The proposed algorithm is evaluated using both objective and subjective quality assessment techniques. It is shown that the text-directed approach improves the quality of the degraded speech over a broad range of noise sources (i.e., flat communications channel noise, aircraft cockpit noise, helicopter fly-by noise, and automobile highway noise) and over a broad range of signal-to-noise ratios (i.e., 10, 5, 0, and -5 dB). In each case, the proposed method is shown to consistently exhibit improved objective quality over linear and generalized spectral subtraction, as well as the Auto-LSP constrained iterative enhancement method, using the Itakura-Saito measure and a 100-sentence evaluation speech corpus. Subjective quality assessment was conducted in the form of an A-B comparison test. Results of these evaluations demonstrate that, for wideband noise distortions, the proposed algorithm is preferred over the unprocessed noisy speech by more than 2 to 1, and over spectral subtraction by more than 3 to 1.
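The evaluation setup described in the abstract (noise added at fixed signal-to-noise ratios, quality scored with the Itakura-Saito measure) can be illustrated with a minimal sketch. This is not the author's code: the function names are hypothetical, and the Itakura-Saito form shown is the common power-spectrum divergence, which may differ from the LPC-based variant used in the paper (the abstract does not say which form was used).

```python
import math


def mean_power(signal):
    """Average power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def scale_noise_to_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power matches `snr_db`.

    Hypothetical helper: one standard way to build test material at
    SNRs such as 10, 5, 0, and -5 dB.
    """
    target_noise_power = mean_power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * n for n in noise]


def measured_snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def itakura_saito(ref_power, est_power):
    """Mean Itakura-Saito divergence between two power spectra.

    Assumes every bin is strictly positive. The value is 0 when the
    spectra are identical and grows as they diverge.
    """
    total = 0.0
    for p, q in zip(ref_power, est_power):
        ratio = p / q
        total += ratio - math.log(ratio) - 1.0
    return total / len(ref_power)
```

For example, `scale_noise_to_snr(speech, noise, -5)` returns a noise signal that, added sample by sample to `speech`, gives a mixture at -5 dB SNR. A lower `itakura_saito` value between the clean and enhanced spectra indicates less spectral distortion.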


Similar resources

Text-directed speech enhancement employing phone class parsing and feature map constrained vector quantization

There are many situations where non-real-time speech enhancement is required. For such applications, employing any available a priori knowledge can lead to more effective enhancement solutions. In this study, a novel text-directed speech enhancement algorithm is developed for usage in non-real-time applications. In our approach, the text of the intended dialogue is used to partition noisy speec...


Text-informed speech enhancement with deep neural networks

A speech signal captured by a distant microphone is generally contaminated by background noise, which severely degrades the audible quality and intelligibility of the observed signal. To resolve this issue, speech enhancement has been intensively studied. In this paper, we consider a text-informed speech enhancement, where the enhancement process is guided by the corresponding text information,...


Text-directed speech enhancement using phoneme classification and feature map constrained vector quantization

This paper presents and evaluates a novel text-directed speech enhancement algorithm for usage in non-real-time applications. In our approach, the text of the intended dialogue is used to partition noisy speech into regions of broad phoneme classifications. Classes considered include stops, fricatives, affricates, nasals, vowels, semivowels, diphthongs and silence. These partitions are then used ...


Integration of DNN based speech enhancement and ASR

Speech enhancement employing Deep Neural Networks (DNNs) is gaining strength as a data-driven alternative to classical Minimum Mean Square Error (MMSE) enhancement approaches. In the past, Observation Uncertainty approaches to integrate MMSE speech enhancement with Automatic Speech Recognition (ASR) have yielded good results as a lightweight alternative for robust ASR. In this paper we thus exp...


A Speech Enhancement Method Employing Sparse Representation of Power Spectral Density

A speech enhancement method employing sparse reconstruction of the power spectral density is proposed. The overcomplete dictionary of the power spectral density is learned by an approximate K-singular value decomposition algorithm with a nonnegativity constraint. The power spectral density of the clean speech signal is reconstructed by the least angle regression method with a norm termination rule, and the...



Publication date: 1997